This tutorial is an introduction to dealing with tidy spatial networks in R, demonstrating a full process of data acquisition from the open spatial database OpenStreetMap, data preparation, and basic network analysis like isodistance and shortest path calculation.
To follow this tutorial, you will have to have a number of packages available, which can be best sorted out with the following command:
install.packages(c("tidyverse", "sfnetworks", "tmap", "osmdata"))
The main package this tutorial centres around is the sfnetworks package, which is the result of joining what the tidygraph package does for networks with what the sf package does for spatial vector data.
The data structure central to this package is a combination of two spatial objects, one describing the nodes (or points) and another describing the edges (or lines connecting the points). In the words of the developers:
“A close approximation of tidyness for relational geospatial data is two sf objects, one describing the node data and one describing the edge data.”
In other words, a sfnetwork object is made up of one
sf object for nodes and one sf object for
edges. (As opposed to two simple data.frames for a
tidygraph object.)
A visual explanation of sf objects by Allison
Horst (copyright hers): a dataframe with sticky geometries in a list
column. The geometries are “sticky” because they don’t vanish when, say,
you select() columns.
In the edges component of a network, it is also possible (or
necessary) to specify where the edge starts and finishes, using the
from and to columns. Such information can be
interpreted as the direction of the edge in a “directed” network, or
simply as its two extremities in the case of an “undirected”
network.
Finally, another important characteristic of the edges is their weight. This can relate to any kind of information, but if we are interested in distances between points, it will typically be the length of the edges.
Although the structure of an sfnetwork seems to be quite
complex, it is possible – and useful! – to use a shortcut and create one
from a simple collection of linestrings.
We can download download data from a spatial vector database like OpenStreetMap (OSM), focus on
the lines (or “ways”) and convert that sf object into an
sfnetwork that will use the lines as edges, and their
endpoints as nodes.
Let’s first download from OSM all the ways that are primary used by pedestrians in the suburb of West End. (To understand how these features are tagged in the database, a good starting point is the Map Features page on the OSM wiki.)
library(osmdata)
## Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
# build the query
we_foot <- opq("west end, meanjin") %>%
add_osm_features(features = c('"highway"="footway"',
'"highway"="steps"',
'"foot"="yes"',
'"highway"="living_street"')) %>%
osmdata_sf()
The osmdata package allows building Overpass queries to download OSM vector data that matches our criteria:
Features tagged as highway=footway, including
footpaths, crossings…
Features tagged as highway=steps, for
stairs
Features tagged as foot=yes, for example bikeways on
which foot traffic is allowed (see this shared
bikeway for an example)
Features tagged as highway=living_street, a shared
road on which pedestrians often have right of way
Finding the right combination of OSM tags to use is always an iterative process, refining the selection at each step.
We choose to return an sf object, which might contain
points, lines and polygons. We are only interested in the lines:
names(we_foot)
## [1] "bbox" "overpass_call" "meta"
## [4] "osm_points" "osm_lines" "osm_polygons"
## [7] "osm_multilines" "osm_multipolygons"
we_foot_lines <- we_foot$osm_lines
We can visualise the resulting object with the sf package:
library(sf)
## Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3; sf_use_s2() is TRUE
we_foot_lines %>%
# st_crop(west_end_sf) %>%
st_geometry() %>%
plot()
It seems to have kept a ferry route, which was tagged with
foot=yes and route=ferry on OSM. We can remove
it:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
we_foot_lines <- we_foot_lines %>%
filter(is.na(route))
Another caveat in our process is that some footpaths are mapped as closed ways on OSM, and are therefore returned as a polygon:
we_foot$osm_polygons %>%
st_geometry() %>%
plot()
The smaller ones are proper pedestrian areas, but the bigger ones are probably footpaths encircling a whole residential block. It is possible to “cast” these polygons as ways and include them in the network:
# cast polygons to lines
poly_to_lines <- st_cast(we_foot$osm_polygons, "LINESTRING")
# bind all lines together
library(dplyr)
we_foot_lines <- bind_rows(we_foot_lines, poly_to_lines)
This is what we are left with:
# plot it
we_foot_lines %>%
st_geometry() %>%
plot()
We can now convert the object to an sfnetwork object,
making sure we set it as undirected (the default is
directed = TRUE):
library(sfnetworks)
## Warning: It seems that you are using an old version of PROJ. If you use the built-in roxel dataset, please remember to reassign its CRS, i.e. sf::st_crs(roxel) = sf::st_crs("EPSG:4326"), before running any example.
foot_net <- as_sfnetwork(we_foot_lines, directed = FALSE)
plot(foot_net)
The current coordinate system for the network is a global one, EPSG:4326.
st_crs(foot_net)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
st_transform() allows us to transform the coordinates to
a different projection. For this part of the world, a recent Projected
Reference System is EPSG:7856 (or
“GDA2020 / MGA zone 56”).
foot_net <- st_transform(foot_net, 7856)
If the command above gives you a GDAL error, reassign the original
CRS first: st_crs(foot_net) <- 4326
As mentioned before, a common weight associated to each edge of the network is the edge’s length. We can add this weight to our network, but we need to first “activate” the part of the object we want to modify:
foot_net <- foot_net %>%
activate("edges") %>%
mutate(weight = edge_length())
Using st_as_sf(), we can extract the components of an
sfnetwork object and use them in a familiar plotting system
like ggplot2:
library(ggplot2)
ggplot() +
geom_sf(data = st_as_sf(foot_net, "edges"),
mapping = aes(colour = as.numeric(weight))) +
labs(colour = "Edge length (m)")
sfnetworks and tidygraph include many pre-processing and cleaning
functions for graphs, some of them detailed in this
article. One example of pre-processing is the removal of nodes that
have only two edges connected, also called “smoothing of pseudo-nodes”.
We need to combine tidygraph’s convert() function with a
“spatial morpher” function from sfnetworks:
library(tidygraph)
##
## Attaching package: 'tidygraph'
## The following object is masked from 'package:stats':
##
## filter
foot_simple <- convert(foot_net, to_spatial_smooth)
plot(foot_simple)
One more relevant example for this dataset is the subdivision of edges: because some edges have interior nodes that are endpoints of other edges, they will not be connected to each other when analysing the network.
We again use a combination of convert() with the
relevant spatial morpher, to_spatial_subdivision():
foot_net <- convert(foot_net, to_spatial_subdivision)
The network should now contain more edges.
More information about “spatial morphers”, including what options
exist for dealing with attributes when multiple features are merged, is
available in the documentation: ?spatial_morphers
At this point, it might be interesting to create an interactive visualisation to zoom into, which might help spot issues with the data.
library(tmap)
tmap_mode("view") # set to interactive mode
tm_tiles("CartoDB.Positron") +
tm_shape(st_as_sf(foot_net, "edges")) +
tm_lines(col = "footway", palette = "Accent", colorNA = "red") +
tm_shape(st_as_sf(foot_net, "nodes")) +
tm_dots()
There are a few islands in the network, in which not many points are connected to each other. We can remove those, by:
local_size() functionfoot_net <- foot_net %>%
activate(nodes) %>%
mutate(neighbourhood = local_size(order = 6)) %>%
filter(neighbourhood > 5)
Let’s now draw an isodistance around the Kurilpa Library in West End. First, download the library’s feature from OSM, and calculate its centroid (because it is mapped as a building):
kurilpa_lib <- opq_osm_id(id = 523925261, type = "way") %>%
osmdata_sf() %>%
.$osm_polygons %>%
st_centroid() %>%
st_set_crs(4326) %>% # (if the following step generate GDAL error)
st_transform(crs = 7856)
Then, calculate the isodistance for a distance smaller or equal to
2000 metres (more or less a half-an-hour walk), using the
node_distance_from() function from tidygraph:
foot_net <- activate(foot_net, "nodes")
iso <- foot_net %>%
dplyr::filter(node_distance_from(st_nearest_feature(kurilpa_lib, foot_net), weights = as.numeric(weight)) <= 2000)
Finally, draw a polygon around the isodistance, and plot everything:
iso_poly <- iso %>%
st_geometry() %>%
st_combine() %>%
st_convex_hull()
plot(foot_net, col = "grey")
plot(iso_poly, col = NA, border = "black", lwd = 3, add = TRUE)
plot(iso, col = "lightgreen", add = TRUE)
plot(kurilpa_lib, col = "red", pch = 8, cex = 2, lwd = 2, add = TRUE)
Let’s now calculate the shortest distance from the South Brisbane Sailing Club to the library.
# get the location of the Sailing Club's entrance
sailing_club <- opq_osm_id(id = 7622867925, type = "node") %>%
osmdata_sf() %>%
.$osm_points %>%
st_set_crs(4326) %>% # (if the following step generate GDAL error)
st_transform(crs = 7856)
# calculate the shortest path
shortest <- foot_net %>%
activate(edges) %>%
st_network_paths(from = sailing_club, to = kurilpa_lib)
# extract the node IDs
node_path <- shortest %>%
slice(1) %>%
pull(node_paths) %>%
unlist()
# only keep the network for these nodeIDs
path_sf <- foot_net %>%
activate(nodes) %>%
slice(node_path) %>%
st_as_sf("edges")
# visualise
tm_tiles("CartoDB.Positron") +
tm_shape(path_sf) +
tm_lines(col = "red") +
tm_shape(sailing_club) +
tm_dots(col = "blue", size = 0.1, text = "Sailing club") +
tm_shape(kurilpa_lib) +
tm_dots(col = "green", size = 0.1, text = "Library")
The granularity of the OSM data available for this area allows us to
create a precise path for pedestrians, switching sides of roads when a
crossing is reached. However, every new analysis of the network might
reveal more interesting information about the data we have used. In this
example, some private residential paths (usually blocked by gates) have
been used, and avoiding those would mean extra steps in pre-processing
(for example using existing OSM tags like barrier=gate and
access=private.
Apart for illustrations in which the copyright is mentioned, this article is released under a CC-BY 4.0 licence. All data used for the data visualisations use OpenStreetMap data, which is © OpenStreetMap contributors but released under an ODBL licence.